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Abstract 

A new method of deriving reduced models of Hamiltonian dynamical systems 
is developed using techniques from optimization and statistical estimation. Given 
a set of resolved variables that define a model reduction, the quasi-equilibrium 
ensembles associated with the resolved variables are employed as a family of trial 
^-j , probability densities on phase space. The residual that results from submitting 

these trial densities to the Liouville equation is quantified by an ensemble-averaged 
cost function related to the information loss rate of the reduction. From an initial 
nonequilibrium state, the statistical state of the system at any later time is estimated 
by minimizing the time integral of the cost function over paths of trial densities. 
Statistical closure of the underresolved dynamics is obtained at the level of the value 
function, which equals the optimal cost of reduction with respect to the resolved 
variables, and the evolution of the estimated statistical state is deduced from the 
Hamilton-Jacobi equation satisfied by the value function. In the near-equilibrium 
regime, or under a local quadratic approximation in the far-from-equilibrium regime, 
this best-fit closure is governed by a differential equation for the estimated state 
vector coupled to a Riccati differential equation for the Hessian matrix of the value 
function. Since memory effects are not explicitly included in the trial densities, a 
single adjustable parameter is introduced into the cost function to capture a time- 
scale ratio between resolved and unresolved motions. Apart from this parameter, 
the closed equations for the resolved variables are completely determined by the 
underlying deterministic dynamics. 

Key Words and Phrases: nonequilibrium statistical mechanics, turbulence closure, model 
reduction, statistical estimation, optimization, Hamilton-Jacobi equation 
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1 Introduction 



Complex nonlinear dynamical systems with many interacting degrees of freedom, or many 
coupled modes of motion, are formulated throughout the sciences for the purpose of mak- 
ing reliable predictions about the evolution of system states. But the practical usefulness 
of these high-dimensional models is limited unless they are combined with some kind of 
model reduction. Indeed, what is usually desired from a model of a complex system is 
a quantitative description of some robust, collective behavior. Such a description is not 
necessarily furnished by individual solution trajectories, owing to the generic presence of 
deterministic chaos. Moreover, in realistic problems neither the specification of initial 
states nor the measurement of evolved states is exact. It is generally desirable, there- 
fore, to select some subset of the system's dynamical variables and declare them to be 
resolved, or relevant, variables, and to seek a reduced dynamical description in terms of 
them. For instance, in a spatially-extended system the resolved variables could furnish 
a coarse-grained description of the fully resolved, fine-grained dynamics. Aside from the 
practical considerations that constrain resolution in numerical simulations, the selection 
of these resolved variables is normally determined by two considerations: (1) how initial 
states are prepared and evolved states are observed, and (2) whether it is possible to 
achieve an approximate closure of the dynamics in terms of the resolved variables. In 
most instances there is no perfect selection. Also, there is a competition between these 
two criteria, because the first is more easily satisfied by a few resolved variables, while 
the second is better achieved by many. In this light, the modeler is often confronted with 
the general problem of deriving a reduced system of governing equations for a selected, 
though not unique, set of resolved variables. 

This model reduction procedure is necessarily statistical, since it relegates all unre- 
solved variables to a probabilistic description. A systematic approach to model reduction 
therefore naturally makes recourse to the methods of statistical mechanics, which fur- 
nishes a collection of methodologies for deriving macroscopic behavior from microscopic 
dynamics [2, 3, 7, 23]. But this field has traditionally developed in the narrower context 
of deriving thermodynamical properties of matter from the attributes and interactions of 
a huge number of elementary constituents. Moreover, since reduced equations governing 
statistically-averaged resolved variables for a complex system are analogous to transport 
equations for thermodynamics variables, the most pertinent methods are those of nonequi- 
librium statistical mechanics [2, 24, 40]. Unlike equilibrium statistical mechanics, which 
is a general theory resting on the secure foundation of Gibbs ensembles, nonequilibrium 
statistical mechanics is still an emerging field whose various branches treat particular 
physical processes and whose methods require special mathematical simplifications. As a 
result, there remain many problems in model reduction on which little progress has been 
made because they pertain to phenomena lying outside the range of existing nonequilib- 
rium theory. For instance, many aspects of turbulence modeling suffer from the lack of 
systematic approximations that are analytically justified and computationally tractable, 
and it is mainly for this reason that the design of reliable statistical closures for turbulent 
dynamics remains such a challenging open problem. 

In this paper we propose a new approach to model reduction for complex determin- 
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istic dynamics. In order to maintain contact with the fundamental notions of statistical 
mechanics, we focus on classical Hamiltonian systems. The essence of our approach to 
statistical closure is an optimization procedure in which we seek computationally tractable 
approximations to solutions of the Liouville equation, which governs the exact evolution of 
probability density on phase space [2, 23, 40]. Instead of manipulating exact but compu- 
tationally inaccessible solutions of the Liouville equation, we use trial probability densities 
on phase space that form a parametric statistical model for which the given resolved vari- 
ables are minimal sufficient statistics [6]. With respect to this family of trial densities, we 
seek paths in the statistical parameter space that have minimal lack-of-fit to the Liouville 
equation. Specifically, we minimize the time integral of an ensemble-averaged, quadratic 
cost function of the Liouville residual over those paths of trial densities which connect an 
initial nonequilibrium state to an estimated state at any later time. In this way we obtain 
a closure that is optimal relative to the resolved variables selected for model reduction. 

To complete the derivation of the statistical closure, we deduce the set of differential 
equations governing the evolution of the estimated parameters for the statistical model. 
To do this we introduce the value function for our best-fit minimization problem, which is 
a function on the statistical parameter space that gives the optimal cost of reduction. As 
is known in optimization theory [12, 18], this value function satisfies a Hamilton- Jacobi 
equation with a Hamiltonian that is conjugate to the Lagrangian cost function. The 
desired equations governing the best-fit statistical state are therefore determined by the 
solution propagated by that Hamilton- Jacobi equation. Finally, computationally tractable 
simplifications of them are systematically extracted under some further approximations. 

We restrict our attention here to the relaxation problem for Hamiltonian dynamical 
systems. Apart from mild regularity and growth conditions, we put no restrictions on the 
Hamiltonian defining the underlying complex dynamics or on the set of resolved variables 
defining the model reduction. In this general context we derive reduced equations that 
approximate the evolution toward equilibrium of ensemble-averaged resolved variables 
from incompletely specified, nonequilibrium initial values. Our purpose is to demonstrate 
how an irreversible statistical closure can be inferred from an underlying dynamics that 
is deterministic, reversible and conservative. Our approach to nonequilibrium statistical 
behavior differs from much of the current literature in that we do not assume that the 
microscopic dynamics is a known stochastic processes, nor do we interpose a stochastic 
model between the closed reduced dynamics and the Hamiltonian dynamics. While our 
general method could be adapted to forced and dissipative systems, we do not consider 
these systems in the present paper, nor do we address the much studied question of 
stationary statistical states for those dynamical systems. 

For the sake of definiteness, we limit our presentation to the particular case of the 
best-fit estimation strategy in which the trial probability densities are quasi-equilibrium, 
or quasi-canonical, ensembles [38, 28, 32]. Namely, we use densities 
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The to- vector A = (A\, . . . , A m ) consists of the resolved variables in the reduced model, 
and p eq denotes an equilibrium probability density. The family of densities (1) is pa- 
rameterized by the real to- vector A = (A 1 ,...^" 1 ) G lR m . Here and throughout the 
paper, * denotes the transpose of a real vector or matrix (so that in these formulas, 
A* A = X 1 Ai + • • • + X m A m ); z denotes the generic point in the phase space T n C M 2n , and 
dz is the element of phase volume on T n . For the purposes of our general development, 
the resolved variables, A k , may be any independent, real- valued, smooth functions on T n 
that are not conserved by the Hamiltonian dynamics, and p eq , may be any invariant den- 
sity, such as the canonical Gibbs ensemble. From the perspective of statistical inference, 
p(z; A) is an exponential family on T n with the natural (or canonical) parameter A, and 
the random vector A is a minimal sufficient statistic for this family [6]. 

In the reduced model the ensemble mean of A with respect to p constitutes the 
macrostate 

a = {a 1 ,...,a m ) = (A\p) = / A(z) p(z; A) dz . 

The parameter vector A is dual to the macrostate vector a, and the convex function 
determines a one-to-one correspondence between A and a through a = d(f>/d\. The choice 
of the statistical model (1) defines a configuration space, M m , for the reduced model, 
the generic point of which is A. The desired reduced dynamics is therefore characterized 
either by the evolution of A or equivalently by the evolution of a. This point of view is 
familiar from the information-theoretic approach to nonequilibrium statistical mechanics 
[21, 22, 23, 38, 14, 28]. 

Our goal is to estimate the macrostate vector a(ti) at any time t\ after an initial time 
to at which a nonequilibrium initial state is specified. For simplicity in this Introduction, 
we assume that the initial density, p(z,t ) = p(z; Ao), is a quasi-equilibrium ensemble of 
the form (1) corresponding to a specified A(to) = Ao, with a(t ) = ao; in the body of the 
paper we give a formulation in which the initial statistical state is incompletely specified. 
The exact density p(z, t) at later times t > t solves the Liouville equation, which can be 
written in exponential representation as 

~dT + Llogp = °' 

where L = {-,H} denotes the Liouville operator associated to the Poisson bracket {•, •}. 
As p evolves it loses its quasi-equilibrium form and develops a more intricate structure 
than the trial probabilities p. The key to an effective statistical closure with respect to 
the given TO-vector A of resolved variables is therefore to devise a good approximation 
to p within the quasi-equilibrium family p. To this end, we calculate the residual with 
respect to the Liouville equation of a trial density p{z\ X(t)) along any smooth path X(t) 
in the statistical parameter space M m : 

R = + Llogp = X*(A-a) + X*LA . 

at 

This Liouville residual R = R(z;X,X), or lack-of-fit of the trial density p(z;X) to the 
Liouville equation, has an interpretation as the instantaneous rate of information loss 
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within the statistical model. We therefore base our estimation strategy for a(ti) on 
minimizing R in an appropriate norm over the time horizon of estimation, to < t < t\. 
Namely, we evaluate the following time-integrated, ensemble-averaged cost functional: 

r C{\X)dt = \ f 1 ((P X R) 2 |p(A)> + 6 2 ((g AJ R) 2 |p(A)) dt (2) 

Jt 2. J t 

where P\ is the orthogonal projection of L 2 (T n , p(A)) onto the span of the resolved vector 
A, and Qx = I — P\ is the complementary projection. The constant e G (0, 1] is an 
adjustable parameter in our closure scheme, which assigns relative weights to the resolved 
and unresolved components of the Liouville residual. 

The integrand in the cost functional (2) may be viewed as a Lagrangian C(X, A) on the 
configuration space M m . In that light our best-fit strategy for closure is determined by 
an analogue to the classical principle of least action. We therefore use Hamilton- Jacobi 
theory [1, 18, 27, 16] to deduce the governing equations for our closed reduced dynamics. 
That is, we introduce the value function (or principal function) 

rti 

v(\i,ti) = min / £(A, A) dt subject to A(£o) = Ao , (3) 

A(ti)=Ai Jt 

which associates an optimal lack-of-fit to an arbitrary terminal state Ai at a terminal 
time ti. This minimization is over admissible paths X(t), to < t < ti, that connect the 
specified initial state A to Ai. The value function, v = v(X,t), satisfies the associated 
Hamilton- Jacobi equation 

dv dv 

in which H(X,p) is the Legendre transformation of £(A,A). The conjugate canonical 
variable, p, has an interpretation as the irreversible part of the flux of the macrostate 
a = (A\p), in the sense that 

" = ff = (^1^ = j t (A\p) ~ (LA\~p) . 

The Hamilton- Jacobi equation propagates the value function forward in time from the sin- 
gular initial condition that all paths emanate from Ao- We designate the best-fit estimate 
of the statistical state at any time t to be the minimizer X(t) of v(X, t); the corresponding 
best-fit macrostate is therefore a(t) = (A\p(X(t))) = d<p / dX{X{t)) . 

The reduced dynamics governing the evolution of the best-fit macrostate can be de- 
duced from the minimizing property of X(t) and the Hamilton- Jacobi equation; the main 
result is 

(1(> (LA | p(A)> - e 2 D^(X) \D 2 v(X, t)] _1 ^(A) , (5) 



dt 



dX 



where w(X) = ( [Q\L(X*A)} 2 \ p(X) }, and D 2 4> and D 2 v are Hessian matrices with respect 
to A. The right hand side of this equation separates into an reversible term, which has 
a standard form, and an irreversible term scaled by e 2 , which has a novel form. The 
irreversible part of the flux in (5) has a generalized gradient structure with a potential- 
type function e 2 w(X) that quantifies the influence of unresolved fluctuations. Besides 
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being scaled by e 2 the irreversible term also depends on e implicitly through the value 
function v = v(X, t; e). 

The defining minimization over paths on the entire time interval of estimation par- 
tially compensates for the fact that the trial densities are memoryless, quasi-equilibrium 
ensembles. This feature of the best-fit closure is its primary advantage as a computational 
method, since no trajectories of the underlying Hamiltonian need be computed. However, 
this feature also requires that the scale factor e be introduced into the cost function and 
be adjusted to give the correct dissipation rate in the reduced dynamics. It is not surpris- 
ing that such adjustment is needed in light of known projection methods [40, 34, 8, 9], 
which furnish various expressions for the dissipative term in (5). These expressions in- 
volve time convolutions with respect to a memory kernel for autocorrelations with respect 
to a projected Liouville propagator, essentially e tQL in our notation. The cost functional 
(2) incorporates a minimal representation of the memory of unresolved fluctuations by 
means of a single real parameter e. This approach dispenses with the computation of any 
memory kernel at the expense of an adjustable parameter that must be tuned empirically. 

A further approximation of the reduced dynamics (5) is desirable because evaluation of 
the value function requires solving the Hamilton- Jacobi equation, which is computational 
burdensome except when m is small. Accordingly, we derive a more explicit closure scheme 
for the pair (A(t),M(t)), where M(t) = D 2 v(X(t),t) denotes the Hessian matrix of the 
value function at the best-fit state. To do so we make the local quadratic approximation 
v(X,t) « u(A(f),t) + (1/2)[A - X(t)]*M(t)[X - X(t)] in a neighborhood of A, and we find 
that the Hamilton- Jacobi equation reduces to a coupled system for A and M, in which 
M satisfies a matrix Riccati differential equation with coefficient matrices that depend 
on A. The local quadratic approximation used to derive these closed reduced equations 
from the general best-fit theory has the character of a truncation of a closure hierarchy 
at the second order. In this sense it is a rational method of acquiring a computationally 
tractable statistical closure that could be applied far from equilibrium. 

We also show that, in the near equilibrium regime under the standard linear response 
approximations [7, 40] , this local quadratic representation of the value function is valid 
globally and the matrix Riccati equation has constant coefficients. Thus, for near equilib- 
rium behavior we obtain a simpler system of closed reduced equations, which is compara- 
ble to the phenomenological equations of linear irreversible thermodynamics [13, 31], but 
with a time-dependent matrix of transport coefficients that is determined by the solution 
to the Riccati equation. 

Our approach seems to have no antecedent in the literature. While we build on the 
ideas of Jaynes [21, 22] and Zubarev [38], as well as later workers [36, 28, 39], in that 
we utilize quasi-equilibrium densities, which maximize entropy subject to instantaneous 
macroscopic constraints, our strategy of minimizing a time-integrated cost function for 
the Liouville residual over paths of these densities appears to be new. Furthermore, 
our closed reduced equations have a different format from that of other theories. In 
particular, the differential equation for the estimated statistical state X(t) is coupled to 
a differential equation for the Hessian matrix, M(t), which quantifies the information 
content in the estimate. In this sense our theory simultaneously estimates the evolving 
macrostate and quantifies the uncertainty inherent in that estimate. The format of our 
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closed reduced equations, therefore, resembles that of continuous-time optimal filtering 
theory [4, 11, 29, 30], and in the near equilibrium regime there is a close connection 
between our reduced equations and the Kalman-Bucy filter. Unlike the standard filtering 
problem, however, our statistical closure updates the reduced model by accessing the 
unresolved components of the deterministic equations of motion themselves, rather than 
assimilating some measurements. 

The outline of the paper is as follows. We set the background for our approach 
in Section 2, and then we present the best-fit closure scheme in Sections 3 and 4. This 
theoretical development applies to nonequilibrium states that may be far from equilibrium. 
In Sections 5 and 6 we specialize our general formulation to the near-equilibrium version 
of our theory. In the case when all the resolved variables are symmetric under time 
reversal, we show that our near-equilibrium closure takes a form similar to that of linear 
irreversible thermodynamics. In Section 7 we give a heuristic analysis that supports the 
physical interpretation of the adjustable parameter e as a time-scale ratio. 

In a companion paper [33] we address the computational implementation of the gen- 
eral methodology developed in this paper, and we present comparisons of its predictions 
against fully-resolved numerical simulations for a particular Hamiltonian system. 

2 Background 

We consider a general Hamiltonian dynamics in canonical form 

f t = JV z H{z) with J = ( ° x £ ) , (6) 

where z = (q,p) denotes a generic point in the phase space T n = M 2n , where n is the 
number of degrees of freedom of the system. We place no special restrictions on the 
Hamiltonian H other than it be a smooth function on T n with a natural growth condition 
at infinity: H{z) > b\z\ 2 — c, with b > 0, c > 0, for large \z\. Most of what follows holds 
for noncanonical Hamiltonian systems, but we will restrict our development in this paper 
to classical canonical systems for the sake of clarity. 

We denote the phase flow for the Hamilton equations (6) by $(£) : T n — > T n , so that 

z ( tl ) = - t )(z(t )) for all tx > t , 

where z(t) is any solution of (6). This deterministic phase flow $(£) is a volume-preserving 
diffeomorphism of T n for all t, by Liouville's theorem. The invariant 2n-volume on T n is 
denoted by dz. 

We are interested in estimating or approximating the macroscopic behavior of a few 
dynamical variables rather than following the details of the microscopic dynamics (6) 
itself. We therefore suppose that some resolved, or relevant, dynamical variables are 
selected, and we seek a statistical closure in terms of these variables. We assume that each 
dynamical variable is a smooth real- valued function on T n , and that the set Ai, . . . , A m 
is linearly independent. We assemble them into the resolved vector A = (A±, . . . , A m ). In 
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principle there is no restriction on m, but in practice the number m of relevant variables 
should be small compared to the dimension n of the phase space. 

The quality of any choice of resolved variables is determined by the ability of the result- 
ing reduced model to approximate the collective behavior of the system. The assessment 
of any particular selection of resolved variables therefore first requires the formulation 
of a closure scheme. Nonetheless, in most practical problems there will be some natu- 
ral choices of A in terms of which it is reasonable to expect a good approximation. We 
mention just two such physical systems: molecular dynamics and dispersive wave turbu- 
lence. For a coupled system of particles in which a few tagged particles interact with 
many particles constituting a "heat bath," the canonical variables of the tagged particles 
furnish natural resolved variables for reduction [17, 19]. For a nonlinear wave system with 
many interacting modes, a kinetic description of the power spectrum (perhaps in some 
selected bands) is a customary reduction for "weak turbulence" theory [37, 5]. Even in 
these cases, though, improved approximations may result from expanding or otherwise 
modifying the set of resolved variables. In light of considerations of this kind, we proceed 
without putting any special restrictions on the resolved vector A. 

The evolution of any dynamical variable, F, resolved or not, is determined by the 
equation 

£ = {**>. m 

where {F, H} = (VF)*JVH is the Poisson bracket associated with the canonical Hamil- 
tonian structure. Indeed, the statement that (7) holds for all smooth functions F on Y n 
is equivalent to the Hamiltonian dynamics (6). Fundamentally, the problem of closure in 
terms of the resolved variables Ai, . . . ,A m arises from the fact that, except under very 
special circumstances, the derived variables A\ = {Ai, H}, . . . , A m = {A m ,H} are not 
expressible as functions of Ai, . . . , A m . For instance, in the exceptional case when A = QA 
identically on r n for some mxm constant matrix Q, a deterministic closure is immediate 
from (7). We are however interested in the generic case, and hence we adopt a statistical 
description defined by that evolving probability measure p(dz, t) on the phase space T n 
which is induced by the phase flow; namely, 

/ p(dz,t 1 ) = / p(dz,t ) for all Borel subsets B C T n , and any h>t . 

J<S>(ti-t )(B) Jb 

We consider only probability measures, p(dz,t) = p(z,t)dz, having densities p that are 
smooth in z and t, because all trial densities in our reduced model have this regularity. 
Then the propagation of probability by the phase flow is governed by the Liouville equation 

do 

+ Lp = inr„xiR, (8) 

in which we introduce the Liouville operator L = {-,H}. Given a density p(z,t ) at an 
initial time to, (8) completely determines the density p(z,t) at any later time t, denoted 
formally by p(-,t) = e -('-'o) L p(. ; t ). The statistical mean of any dynamical variable F 
at time t is given by 

(F\p(t)) = f F(z)p(z,t)dz = f Fo<S>(t-t )(z)p(z,t )dz, 
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where the first equality defines our notation for expectation. In particular, the evolu- 
tion of the statistical average of the relevant vector (A\p(t)) is determined by the exact 
solution of (8). From the point of view of a numerical computation, however, solving 
the Liouville equation impractical because it requires the simulation of an ensemble of 
exact trajectories for the complex dynamics (6). For this reason we resort to the following 
natural approximation procedure. 

We seek to approximate the exact density p{t) by a family of trial densities, p(t), that 
have a simple and tractable analytical form. For reduction relative to a given resolved vec- 
tor A, a natural choice is supplied by the so-called quasi-equilibrium, or quasi-canonical, 
densities [21, 22, 38, 32] already displayed in (1). A standard motivation for using the 
family of densities (1) to construct a statistical closure is that each member of the family 
maximizes information entropy subject to the mean value of the resolved vector; that is, 
p solves 

maximize S(p) = — p log — dz subject to / Apdz = a, / pdz — 1. 

"T n p e q "T n -T n 

From the perspective of information theory [10, 25], p(z, A) is the least informative proba- 
bility density relative to the equilibrium density p eq that is compatible with the macrostate 
vector 



a = (A\p) = / A(z)p(z;X)dz. (9) 

The parameter vector A e M m then consists of the Lagrange multipliers for the vector 
constraint (9), and there is a one-to-one correspondence between A and a given by 

dd> , ds j . 

a -i' A = -&' < 10 > 

where s(a) = S(p) denotes the entropy of the macrostate a. This correspondence is a 
convex duality, and —s(a) is the convex conjugate function to 0(A). 

In this formulation we fix an equilibrium density p eq on T n and construct the family of 
quasi-equilibrium trial densities p relative to it. An alternative formulation is to include 
the Hamiltonian H among the resolved variables, by forming an augmented resolved vector 

A = (A , A-y, . . . , A m ) with A = H. Then the trial densities p = exp[A A H V \ m A m — 

0(A)] have the augmented parameter vector A G R m+1 and are respect to phase volume dz. 
This alternative leads to a parallel theory. Our formulation in terms of a non-conserved 
resolved vector A focuses the discussion of nonequilibrium concepts and facilitates the 
derivation of the near-equilibrium theory via the linear response approximation. 

In physical terms the quasi-equilibrium description establishes a nonequilibrium statis- 
tical mechanics without memory, in the sense that p(z, A(t)) depends only on the instan- 
taneous macrostate a(t), not on its history, a(t') for t' < t. Such a memoryless description 
might be justified when there is a wide separation between the time scale of evolution of 
the resolved vector, A, and the time scale of conditional equilibration of the unresolved 
variables. In the limit of an infinite time-scale separation, an instantaneous statistical 
closure with respect to these quasi-equilibrium densities is obtained by imposing the A- 
moment of the Liouville equation, namely, 

j t (A{z)\p{ Z] \{t))) = (LA(z)\p(z;X(t))). 
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But the resulting statistical closure is exactly entropy conserving, as the following straight- 
forward calculation shows: 

j f S(p) = -A*^ = jf LX*Aexp[X*A-<f ) (X)]p eq dz =0. 

Thus, the combination of a quasi-equilibrium ansatz and an instantaneous moment closure 
results in a reversible reduced dynamics [14, 20]. The inability of this adiabatic closure 
to capture the dissipation, or entropy production, of the actual nonequilibrium statistical 
behavior is a serious defect, making it useful only for certain short-time phenomena. 

In much previous work [38, 39, 28, 35, 20], the remedy for this defect has been to 
include memory effects into the relevant probability densities by replacing X(t)*A with 
time- weighted averages of the dynamical variables, / °° X(t — r)* Ao $(— r) w{r) dr. Clo- 
sure is then obtained by taking instantaneous A-moments with respect to these memory- 
dependent densities. The resulting reduced equations, however, depend upon the weight- 
ing function w, for which there is no universal choice. Moreover, they involve convo- 
lutions over time, and therefore require the evaluation of various correlations of A and 
LA = {A, H} over a range of time shifts r. The statistical closures derived in this manner 
are consequently difficult to justify theoretically and expensive to implement computa- 
tionally. 

The new approach that we propose in the next section is fundamentally different. 
Rather than use memory-dependent densities and moment closure, we retain the quasi- 
equilibrium densities as natural trial densities in a parametric statistical model. With 
respect to this model we obtain closure by best-fitting trajectories of trial densities to the 
Liouville equation over the entire time interval from when the initial statistical state is 
specified to when the evolved state is estimated. 



3 Formulation of best-fit closure 

As outlined in the Introduction, we quantify the lack-of-fit of the quasi-equilibrium trial 
probability densities (1) to the Liouville equation (8) over an interval of time t < t < t\ 
by a cost functional 

E[A;t ,ti] = [ h £(A, A) dt . (11) 

Jt 

In this formulation the cost functional is analogous to an action integral for a Lagrangian 
£, which is a function of A and A = dX/dt, and the configuration space is the parameter 
space lR m for the statistical model (1). To construct an appropriate (running) cost func- 
tion £(A,A), we introduce the residual of the log-likelihood, logp, of the trial densities 
with respect to the Liouville operator: 

R(-; A, A) = ^ + L) logp(.; X(t)) = X(t)*(A - a(X(t))) + X(t)*LA . (12) 

There are two related expressions for this Liouville residual R that reveal its significance 
in the statistical closure problem. 



10 



First, for any z G T n and any smooth parameter path X(t), we examine the log- 
likelihood ratio between the exact density and the trial density after a short interval of 
time At: 

-(At)L 5 / 

log ^. A( f +At)) = -(^t)R(z;X(t),X(t)) + 0((At) 2 ) as At -+ . 

This expansion shows that, to leading order locally in time, —R(z; X(t), X(t)) represents 
the information in the sample point z for discriminating the exact density against the 
trial density [25] . By considering arbitrary smooth paths passing through a point A with 
a tangent vector A, we may consider R(z; A, A) evaluated at z as a function of (A, A) in 
the tangent space to the configuration space, and we may interpret it to be the local rate 
of information loss in the sample point z for the pair (A, A). 

Second, for any dynamical variable F : T n — > M, we evaluate the F-moment of the 
Liouville equation with respect to the trial densities along a path X(t) and we obtain the 
identities 

j t (F\p(X(t)))-(LF\p(X(t))) = (F\(d/dt + L)p(X(t))) = (FR\p(X(t))) 

where the second equality follows directly from the definition (12). Thus we find that, 
while an exact solution p(t) of the Liouville equation satisfies the identities, d/dt(F\p(t)} — 
(LF\p(t)) = 0, for all test functions F, a trial solution p produces a departure that 
coincides with the covariance between F and R. This representation of R furnishes a 
natural linear structure for analyzing the fit of the trial densities to the Liouville equation. 

In light of these two interpretations of the Liouville residual R, we proceed to measure 
the dynamical lack-of-fit of the statistical model in terms it. To do so, we consider the 
components of R in the resolved and unresolved subspaces. At any configuration point 
A G M m , let Pa denote the orthogonal projection of the Hilbert space L 2 (T n ,p(A)) onto 
the span of the functions A\ — ai(A), . . . , A m — a m (X), and let Q\ — I — P\ denote the 
complementary projection; specifically, for any F G L 2 (r„,p(A)), 

P X F = (F(A - a(X))* | p(A)> C^A)- 1 ^ - a(A)) . 

where 

C(X) = ((A-a(X))(A-a(X)y\p(X)). (13) 
We take the cost function in (11) to be 

£(A, A) = \({P,Rf | p(A)> + ^ ((QxR) 2 1 p(A)> (14) 

for a constant e G (0, 1]. This lack-of-fit norm is of mean-squared type, but with an 
adjustable parameter e that controls the weight given to the unresolved component of the 
residual versus the resolved component. 

By duality, £(A, A) can be characterized in terms of test functions F as follows: 

2£(A,A) = max{ (Fit>|p> 2 : ((P X F) 2 \ p) + e~ 2 ((Q X F) 2 \ p) < 1, FeL 2 (T n ,p)}. 
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This dual form of the norm highlights the fact that C weights departures in the unresolved 
dynamical variables less than departures in the resolved variables, depending on e. In 
particular, when e is small the constraint in this maximization gives preference to those 
F which have a relatively small component in the unresolved subspace. Thus, as the 
adjustable parameter e is decreased, the contribution of the unresolved variables to the 
cost function C is correspondingly decreased; and, in the limit as e — > 0, it vanishes 
entirely In Section 7 we explain how e is related to time scales and memory effects. 

Before presenting the optimization problem that defines our best-fit closure, we first 
exhibit the Lagrangian £ in a more explicit form. The Liouville residual has zero mean, 
(R\p(X)) = 0, and its orthogonal components are given by 

P X R=[X- C(A)- 1 (LA|p(A)) }*(A - a) , Q X R = X*(Q X LA) . 

The calculation of P\R employs the string of equations, 

(LA | p) = ((LA)e x * A -«>W | peq) = - a)Le x * A ^ \ p eq ) = -((A - a)L(X*A) \ p) 

in which the anti-symmetry of the operator L with respect to p eq is used. Hence the 
Lagrangian cost function can be written in the explicit form, 

£(A,A) = \[X-C{X)-\LA\p{X))yC{X)[X-C{X)-\LA\p{X))] + ^A*D(A)A, (15) 
where 

D(X) = ((Q x LA)(Q x LA*)\p(\)) . (16) 

By analogy to analytical mechanics, one may regard the first member in (15) as the 
"kinetic" term and the second member as the "potential" term. The kinetic term is a 
quadratic form in the generalized velocities A with positive-definite matrix C(A). In fact, 
this matrix is the Fisher information matrix [25, 6] for the exponential family (1), having 
components 

It defines a natural Riemannian metric, ds 2 = J2i,j Cij(X)dX 1 dX\ on the configuration 
space M m . The potential term equals e 2 times the function 

w(X) = h*D(X)X = i([Q A Llogp] 2 |p(A)>, (17) 

which we call the closure potential because it embodies the influence of the unresolved 
variables on the resolved variables. Of course, these mechanical analogies are not literal. 
Indeed, even though we refer to £ as a "Lagrangian," it has the units of a rate of entropy 
production, not of an energy, and it is a sum of two positive-definite terms, not a difference. 
For a given initial time t , we define the value function [4, 12] 

v(X l ,t 1 ) — min v (X(t )) + E[A;* ,ti], (18) 

A(t)=Ai 
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in which the minimization is over all regular paths A(t) on the interval t$ < t < t\ 
that terminate at an arbitrary (admissible) state Ai G M m at time t±. The optimization 
problem in (18) is the foundation of our best-fit closure scheme. The value function (18) 
quantifies a total lack-of-fit of the statistical state Ai at time t\ with respect to evolution 
from an incompletely specified statistical state at time to- As explained above, the second 
member of the objective functional in (18) is a dynamically intrinsic norm on the Liouville 
residual for a path of densities p(X(t)) over the time interval, t < t < t±. The first member 
of the objective functional determines the initial value function, v (X), which is given data 
in the optimization problem. In the special case when initial density, po, is specified to be 
a quasi-equilibrium density, p(-;Ao), for a known m-vector Ao, the initial value function 
t>o is singular; namely, 

f +oo A 7^ A 
10 A = A 



In this case the first member in the objective functional in (18) can be dropped and the 
constraint A (to) = X imposed instead. This singular initial condition can also be obtained 
as the limit of penalty functions i>o(A) = b(X — Xq) 2 /2 as b — > +oo . In the general case 
when there is some uncertainty in the initial statistical state, A (to) is free and is penalized 
by the nonsingular initial value function v , which represents the degree of specification 
of the initial statistical state. 

It is a fundamental fact from the calculus of variations that such a value function 
satisfies a corresponding Hamilton- Jacobi equation [27, 18, 16]. Since C is quadratic 
and convex in A, the required Legendre transformation can be calculated explicitly. The 
conjugate canonical variable is 

= = C(X)X- (LA\p(X)), (19) 
oX 

and the Hamiltonian associated with £ is 

• 3C 1 

n(X, p) — A*— r- — C — -p*C~ l p + {LA | ~p{X)YC- V - e 2 w(A) , (20) 
aA 2 

where we recall the closure potential w(X) defined in (17). The value function in (18) 
therefore satisfies the Hamilton- Jacobi equation 

dv dv 

— + U{X,—) = 0, fort>t , with v(X,t Q ) =v (X). (21) 

At this juncture of our development we have a complete formulation of the desired 
model reduction scheme. Given a resolved vector, A, and an equilibrium density, p eq , 
the lack-of-fit Hamiltonian (20) is entirely determined up to a choice of the parameter 
e G (0, 1], and the Hamilton- Jacobi equation (21) propagates the value function v = 
v(X,t) forward in time from any suitable initial value function i>o(A). We may view this 
Hamilton- Jacobi equation as the appropriate contraction of the Liouville equation relative 
to our statistical model reduction in the following sense. For each instant of time t > t , 
v = v(X,t) is a function on the statistical parameter space that may be conceptualized as 
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a dynamical analogue of a minus- log-likelihood function [6, 25]. The minimizer, A = X(t), 
of v (A, t) defines the best-fit estimate of the parameter A at time t, and thus A is analogous 
to a maximum likelihood estimate. Moreover, the second-order behavior of v(X,t) in a 
neighborhood of A represents the confidence in, or information content of, the best-fit 
estimate. Thus, we may say that, just as the Liouville equation propagates the ensemble 
probability on phase space, the Hamilton- Jacobi equation (21) propagates the uncertainty 
in the reduced model, which is quantified by v. In this light, we discover that the desired 
reduced model closes at the level of the value function v on configuration space, not at 
the level of the estimated configurations themselves. This property of our best-fit closure 
reflects the fact that the value function governs both the best-fit estimate, at first order, 
and its uncertainty, at second order. 

Summarizing, we define the best-fit parameter and macrostate by 

A(f) = argminw(A,t) , a(t) = |r(A(t)) for each t>t . (22) 

A OA 

Accordingly, our best-fit closure is entirely determined by the solution of the Hamilton- 
Jacobi equation (21). Moreover, this best-fit estimation of time-dependent, nonequilib- 
rium statistical states is valid far from equilibrium. While this general result is a certainly 
satisfactory from a theoretical viewpoint, it suffers from the fact that the nonlinear par- 
tial differential equation (21) is difficult to solve except when the number m of resolved 
variables is very small. For this reason, we now proceed to derive ordinary differential 
equations for X(t), or equivalently a(t), and investigate to what extent it is possible to 
circumvent solving the Hamilton- Jacobi equation. 



4 Derivation of closed reduced dynamics 

We are primarily interested in the evolution of the best-fit parameter X(t), and correspond- 
ingly the best-fit macrostate a(t), since they determine our estimates of the expectations 
(B\p(X(t))) of any dynamical variable B. We therefore seek the system of ordinary dif- 
ferential equations for X(t). 

By definition X(t) is the minimizer for v(X,t) for each t > t , and consequently it 
satisfies 

dv - 

jx(t) = — (X(t),t) = for all t > t . (23) 
oX 

Differentiating this relation with respect to t, and invoking the Hamilton- Jacobi equation, 
we get 

d 2 v dX d 2 v 

= TTTT^ 77 + 



dXdX* dt dXdt 
d 2 v dX d „ , , , dv 



17 ~ ^(Kttt) 



dXdX* dt OX y ' dX nx=x 



dXdX* dt dX y ' ' dXdX* dp 
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where 

d 2 v ( d 2 v \ 
d\d\* ~ [dXidXi ) . . , 

denotes the Hessian matrix of v with respect to A. The second and third terms in this 
final expression are calculated from "H to be 

^(A,0) = -e 2 ^(A), ^(A,0) = C(X)~ 1 (LA \ p(X) ) . 

Solving for dX/dt, we arrive at the equation governing the reduced dynamics: 

dw 



^ = CCxr'iLAlpCX)) - e 2 



2„ n- 1 



d v , ? . 
(X,t) 



dXdX 



8A (A). P4) 



This is a first-order system of ordinary differential equations is closed in A, but it 
involves the value function v(X,t), which in turn is determined by the Hamilton- Jacobi 
equation. The first term on the right-hand side of (24) is exactly what is obtained from the 
adiabatic closure when the moment condition (R(A—a) \ p) — is imposed instantaneously 
at each time t; and thus the adiabatic closure is recovered from the best-fit closure in the 
limit as e — > 0. The dissipation, or irreversibility, in the reduced dynamics is produced by 
second term on the right hand side of (24). The magnitude of this dissipation depends 
on the adjustable parameter e, but it does not necessarily scale like e 2 , owing to the 
e-dependence of v through the Hamilton- Jacobi equation. 

We now attempt to find a governing equation for the Hessian matrix, 

M(t) = -^Cm,t) (25) 

since the inverse of this matrix enters into (24). To do so, we calculate the matrix of all 
second partial derivatives with respect to A of the Hamilton- Jacobi equation (21), and 
thereby obtain a partial differential equation for the matrix-valued function 

The result is 

dM d 2 H d 2 H „ , „ , d 2 H „ d 2 H „ , ™ &H dM 

= ~w + sasv + m^ M + M d^x- + J 'W M + £ d^w t ■ (26) 

In order to set A = A and fx — in this matrix equation, we need to evaluate the various 
coefficients in it, using the expression (20) for H. We introduce the shorthand notation 

/(A) = C(Xy 1 {LA | p(X)) , 

which coincides with dX/dt under the adiabatic closure. The required coefficients are 

f(A,0) = /(A), 
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■(A,0) = -e 2 — — -(A), 



<9A<9A* 



<9A<9A* 



d 2 U 
d/idX* 



(A,0) = C(A)- 1 , 



9A (A) 



Substituting these identities into (26) we obtain: 



9/ 

<9A 



+ MC- 1 M + /.^(A li ), 



in which all the coefficient functions are evaluated at A = X(t). Now using the governing 
equation (24) for A, which can be written as, 

§-/<X)-^£(A). 

we find that the matrix-valued function M(t) = M(X(t),t) satisfies 



dM 



df - 

OA 



dX 



- MC '.V/ + r 



d 2 w 
OXdX* 



— e 



M 



<9M 
^A~ 



(A,*), (27) 



in which again X = X(t) throughout this equation. 

Our goal is to close the differential equation (24) for the best-fit parameter A by 
coupling it to a differential equation for M. But the last term in (27) involves the partial 
derivatives dM/dX, and consequently it is not closed in M. In fact, the pair of first-order 
differential equations (24) and (27) constitute the first and second members of a closure 
hierarchy. In principle, we could generate a third member of the hierarchy by deriving 
a differential equation for dM/dX(X,t), and so forth. We will not pursue the hierarchy 
of ordinary differential equations any further, since the best-fit estimation scheme closes 
elegantly at the level of the Hamilton- Jacobi equation for the value function. 

Nonetheless, it is important to achieve an approximate closure scheme that is more 
computationally tractable that solving the Hamilton- Jacobi equation itself. Perhaps the 
most natural approximation of this kind is to set dM/dX(X, t) = identically. We call this 
the local quadratic approximation, because it amounts to replacing the solution v(X,t) of 
the Hamilton- Jacobi equation by its second-order Taylor expansion around A; that is, 

v(X,t) w v(X,t) + -(X - X(t))*M(t)(X - X(t)) for A near X(t) . 



Moreover, recalling that the value function v may be viewed as a dynamical analogue 
of a minus-log-likelihood function, we may regard this local quadratic approximation as 
comparable to a quasi-Gaussian approximation. Under this approximation, the last term 
in (27) disappears and consequently we obtain a closed first-order differential equation for 
M; namely, 



— = — AI-M 
dt dX K 1 



df (X) 

dx {X) 



-MC{Xy l M +e 5 



d 2 w 
OXdX* 



(A) 



(28) 
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We thus arrive at the desired closed reduced equations in the pair (A, M). The coupled 
pair of governing equations for the statistical closure consists of a state equation (24) for A 
and a matrix Riccati equation (28) for M. The parameter vector A determines the best-fit 
macrostate, while the matrix M characterizes the uncertainty in the best-fit estimate, up 
to the local quadratic approximation. The initial condition for (28) is 

M(t ) = dxd °^ (Ao) , where A = argminv (A) . 

We assume that the initial value function v is strictly convex, and hence that M(t ) is 
positive-definite. 

We note that the matrix solution M(t) of (28) for t > to is necessarily symmetric 
and positive-definite whenever the closure potential w is convex along the solution A. 
This conclusion follows from known properties of solutions of matrix Riccati equations, 
which hold for (28) provided that the symmetric matrices C and d 2 w/d\d\* are both 
positive-semidefinite, and the initial condition is positive-definite [26]. While the Fisher 
information matrix C(A) is positive-definite for arbitrary A, the convexity of the closure 
potential is ensured only near to equilibrium; indeed, 

( '~ <u -(A) = D(0) + O(\X\) asA^O, 



<9A<9A* 



and matrix D(X) defined in (16) is positive-semidefinite by construction. Far from equi- 
librium the closure potential could lose its convexity, however, and the solution matrix 
M could become singular. In that situation the closed reduced equations would no longer 
be well-defined. This behavior would signify a bifurcation in the best-fit parameter tra- 
jectory, which we would interpret as a dynamic phase transition in the reduced model. 



5 Near-equilibrium approximation 

In this section we present the simpler and more explicit form of best-fit quasi-equilibrium 
closure that results from applying a linear response approximation [7, 40] in a neigh- 
borhood of the statistical equilibrium state p eq . As in the preceding developments, the 
equilibrium density can be any invariant probability density for the underlying Hamilto- 
nian dynamics. For instance, it can be a canonical Gibbs ensemble: 

p eq (z) = exp(-f3[H(z)-^(f3)]), with = /T 1 log / exp(-f3H(z)) dz , 

for some inverse temperature j3 > 0. We denote the equilibrium mean of any dynamical 
variable or vector F on T n by (F) eq . 

Since the quasi-equilibrium densities (1) reduce to the equilibrium density p eq when 
A = 0, the near-equilibrium theory is the linearization of the general theory around A = 0. 
Throughout this analysis we assume that the resolved vector A is normalized relative to 
equilibrium by {A) eq = 0. We make the usual linear response approximation 

p(z-X) » [1 + X*A(z)]p eq (z). (29) 
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Under this approximation the Lagrangian C(X, A) defined in (14) becomes a quadratic 
form in the pair (A, A): 

1 . . e 2 

£(A,A) « -\\-C- x J\XC\\-C- x J\\ + — \*D\ , (30) 

where 

C = (AA*) eq , J = ((LA) A*) eq , D = ((QLA) (QLA*)) eq (31) 

The major simplification that occurs in the near equilibrium regime is that the coefficient 
matrices C, J, D defining (30) are constants determined by the equilibrium state p eq . These 
matrices have the symmetry properties: C* = C, J* = —J, D* = D. The projections P 
and Q = I — P are also independent of A, being orthogonal operators on L 2 (T n , p eq ). The 
Legendre transformation (19) and (20) now yields 

1 e 2 
H = C\-J\, H(\, //) = -fj?C- V - A* JC~ V - —\*D\ , 

in which the conjugate canonical variable, //, is linear in the pair (A, A), and the Hamil- 
tonian, H(X,/i), is a quadratic form. The closure potential w is the quadratic form, 

w(X) = -X*DX, where D = ((LA)(LA*)) eq + JC~ l J (32) 

Since the Hamiltonian is exactly quadratic, the Hamilton- Jacobi equation (21) admits 
a solution that is a quadratic form in A: 

v(X,t)=v(t) + l -(\-\(t)YM(t)(X-X(t)) 

for some m x m symmetric matrix M(t), m- vector A(t) and scalar v(t). Substitution of 
this ansatz into the Hamilton- Jacobi equation produces equations corresponding to the 
quadratic, linear and constant terms in A. Namely, 

Q= dM + ; u _ j( . , u + j _ e 2 D ^ 

dt 

= -M^ + MC-\JX - e 2 DX , 
dt 

= ^- e -X*DX. 
dt 2 

The first of these equations is a matrix Riccati equation for M . It is supplemented by 
the initial condition M(to) = Mq for a given positive-definite, symmetric matrix Mq. The 
second of these equations determines the best-fit vector A(t), and its initial condition is 
X(t ) = A . The third equation merely generates the additive constant v(t), and its initial 
condition may be set to zero, -O(to) = 0. 

It is transparent that the near-equilibrium approximation produces a best-fit closure 
theory for which the local quadratic approximation discussed in the preceding section 
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holds globally. The structure of the governing equations for the parameter vector A and 
the matrix M is the same as in the general case, given in the preceding section, except 
that the coefficient matrices are constant and the Riccati equation decouples from the 
state equation. It is worthwhile to summarize the closed reduced equations governing 
near-equilibrium: 

[C _1 J -e 2 M- l D}\ (33) 

JC L M - MC './ - MC l M + e 2 D 

The positive-definite, symmetric matrix M(t) has the interpretation as the confidence, 
or information, in the best-fit estimate A(t) at each time t. In particular, if all the 
eigenvalues of M(t) are large, then the value function v(X,t) increases rapidly away from 
its minimum point at A, meaning that the best-fit estimate is sharp with respect to the 
lack-of-fit norm for the Liouville equation. In general, the eigenvalues and associated 
eigenvectors of M(t) characterize the multivariate sensitivity of the best-fit estimate, and 
their evolution in time quantifies the propagation of uncertainty in the reduced model. The 
inverse matrix, G(t) = M(t)~ l , furnishes an alternative characterization of uncertainty 
in the sense that it is comparable to a covariance matrix for the parameter vector A. 
Moreover, as a straightforward calculation shows, the near-equilibrium closed reduced 
equations (33) have the following equivalent form in terms of A and G: 

[C _1 J -e 2 GD}\ (34) 

C-\JG - GJC- 1 - e 2 GDG + C- 1 . 

This form has the attractive feature that matrix inversion is eliminated from the closed 
reduced dynamics. Moreover, the fully specified initial condition, A(0) = A with improper 
v or Mo, is readily handled by the homogeneous initial condition G(0) = on the Riccati 
equation in (34). 

Finally, the pair of equations (34) has a strong resemblance to the equations for a 
Kalman-Bucy filter [4, 11, 12]. That is, we may view the pair of equations (34) as a state 
estimate equation for A coupled to a variance equation for G. This similarity might be 
expected from the fact that our best-fit closure scheme is derived from a dynamic opti- 
mization principle, as are filtering algorithms. However, our problem is not a standard 
filtering problem, because we are concerned with fitting a statistical model to underre- 
solved, deterministic dynamics rather than blending a stochastic dynamics with noisy 
measurements. Nonetheless, we can put our closure problem into a filtering format by 
regarding the linear mapping, A >->■ QL(\*A), which takes a state A into the unresolved 
component of the Liouville residual, as the measurement process. In this formal analogy 
the measured data is the zero function identically in time, so that the measurement error 
corresponds to the L 2 -norm of QL(X*A). The defining minimization principle for our best- 
fit closure is thereby identified with a filtering principle for a squared-norm that blends 



~di 
dM 



dX 
~di 
dG 
~dt 
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the resolved and unresolved components of the Liouville residual using a weight factor 
e 2 . This interpretation may have some utility beyond providing a conceptual connection. 
Namely, it suggests that our best-fit closure scheme could be combined naturally with 
actual continuous measurements, and in this way the estimation of the resolved variables 
could be continually improved by a hybrid of closure and filtering. But we will not pursue 
this line of development in the present paper. 



6 Simplification under time reversal symmetry 

Before discussing the role of the adjustable parameter e, which scales the dissipative effects 
in the best-fit closure, it is convenient first to display the simplified closed reduced equa- 
tions that result from symmetry with respect to time reversal. Accordingly, we consider 
the special case in which each of the resolved variables, [k — 1, . . . , to) , composing 
A is even under time reversal. In this case, and in the case when to = 1 and A = A\ is 
an arbitrary scalar variable, the near equilibrium equations can be solved explicitly and 
the thermodynamic properties of its solutions can be derived easily. Moreover, this case 
pertains to reduced models that are purely dissipative, a physically important situation 
[13]. 

Let 9 : T„ — > T n denote the time reversal operator defined by ®(q,p) = (q, —p) for 
z = (QiP) e T n . A dynamical variable F is even under time reversal if F o © = F on 
r n . We assume that the Hamiltonian and p eq are even. Then the phase flow $(£) on 
r n satisfies = 6 o $(— t) o 6. For any even dynamical variable, F, it follows that 
Fo$(t) = Fo$(-t) o0. 

For a resolved vector, A = (A±, . . . , A m ), composed of even component variables Ak, 
the matrix J = {(LA)A*) eq vanishes. This result follows directly from the fact that LA 
is odd under time reversal, which is easily seen from the identities: 

(LA) o 9 = km — — 

V ' At^O At 

Ao<S>(-At)-A , TA . 

= km ^ — = —(LA) . 

At^O At 

Tkat J = for any scalar resolved variable, A e JR 1 , is immediate from tbe skew- 
symmetry of J. 

In tke case of even resolved variables, tke closed reduced equations (34) kave a scaling 
structure witk respect to tke adjustable parameter e. In terms of tke rescaled time 
r = e(t — to), we introduce tke fundamental solution matrix, ^(r), for tke state equation 
in (34) expressed as an equation for tke macrostate a(t) = CX(t): 

^ = -e 2 CG(t;e)DC- 1 a 

Representing tke solution, a(t) = \P(ei)ao, in terms of its initial state do, and rewriting 
tke solution of tke Riccati equations in (34) in terms of tke rescaled matrix K(r) = 
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eCG(r/e; e)D, we find that (34) is equivalent to the following pair of matrix equations: 



^- = -kC~^, ^f = D-KC- 1 k. (35) 

GET GET 

These equations are supplemented by the initial conditions, ^(0) = /, K(0) = 0, which 
are appropriate to the situation when the arbitrary initial macrostate gso G M m is com- 
pletely specified; incomplete specification of the initial statistical state changes K(0). 

The pair of matrix equations (35) can be simultaneously diagonalized and hence solved 
explicitly. Let W be a matrix of normalized eigenvectors of D relative to C, and let A be 
the corresponding diagonal matrix of eigenvalues, all of which are real and nonnegative. 
The nonsingular matrix W diagonalizes D relative to C, meaning that 

W*DW = A and W*CW = I . 

Making the substitutions, *(r) = W**(r)(W*)- 1 and K(t) = W*K(r)W, in (35), we 
get the diagonalized matrix equations, 

— = -K^ , — — — A — K . 

dr gst 



Their solutions are elementary: \I/(t) = sech(v / A~T) and K(t) = v^A tanh(v / A r), where 
VA denotes the nonnegative square root of the nonnegative diagonal matrix A, and the 
hyperbolic functions act in the obvious way. Inverting the transformation, we obtain the 
desired solutions of (35); namely, 

#( r ) = (W*)- 1 sech(v / A t) W* , K(r) = (W*)- 1 VA tanh(v / A r) W' 1 (36) 

Returning to the closed reduced equations (34) expressed in unsealed time t, we now 
have a relaxation equation for the best-fit macrostate with an explicit, scaled, time- 
dependent coefficient matrix: 

f = -ek(e(t-t ))\(t). (37) 

The format of (37) is reminiscent of the linear transport equations of phenomenological 
nonequilibrium thermodynamics [13, 24, 31]. In that setting, a separation of time scales 
between the evolution of the macrostate and the fluctuations of the microstate is assumed, 
and linear relaxation equations are posited to relate fluxes to forces. In our notation the 
thermodynamic forces are —A, while the thermodynamic fluxes are da/dt. In the well- 
known Onsager theory of near-equilibrium relaxation, the matrix of transport coefficients 
is usually denoted by L, and is constant in time, so that the phenomenological equations 
are 

^ = -LA, with a(t)=CX(t). 

For a resolved vector A that is even under time reversal, the celebrated reciprocity rela- 
tions imply that L must be symmetric. The entropy production is then ds/dt = \*L\, 
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and this expression is invoked to imply that L must be positive-definite. These classical 
results rely on a number of assumptions concerning the format of the macroscopic trans- 
port equations and the statistical properties of the microscopic fluctuations, namely the 
Onsager regression hypothesis. 

By contrast, our relaxation equation (37) has a time-dependent transport matrix, 
eK(e(t — t ))) that is derived from the underlying dynamics via the best-fit model re- 
duction, up to the adjustable parameter e. Moreover, the best-fit transport matrix is 
necessarily positive-definite and symmetric by virtue of the Riccati equation [26]. Thus, 
our best-fit reduced dynamics shares the key qualitative properties of phenomenological 
irreversible thermodynamics: for the near-equilibrium, even-variable regime, it possesses 
positive entropy production and reciprocity relations. Furthermore, the derivation of the 
relaxation equation (37) from the Liouville equaiton does not require an extreme separa- 
tion of time scales. Indeed, the time dependence of K implies that there is a plateau effect 
during which K(t) increases from to its asymptotic limit, K^, which is determined by 
KooC^Koo = D. This plateau effect is regulated by the matrices C and D. More pre- 
cisely, there are m plateau time scales, Ti, . . . , r m , defined by diag {1/ti, . . . , l/r m } = y/~K, 
which give the rates at which the eigenvectors of K{t) equilibriate. Thus, we see that 
the scaled time variable r = e(t — 1 ) applies to the plateau effect, while the original time 
variable t pertains to the relaxation. 

7 Irreversibility and the parameter e 

In this section we turn our attention to the role that the adjustable parameter e plays in 
the best-fit approach to model reduction. 

Broadly speaking, a reduced model for a complex dynamics with finitely-many degrees 
of freedom may be expected to exhibit three distinct time scales: (1) the relaxation time 
scale, T r , over which the macroscopic resolved variables decay (and possibly oscillate); 
(2) a plateau time, T p , over which fluctuations in the unresolved variables influence the 
resolved variables; and (3) a memory time, T m , over which the unresolved fluctuations 
decorrelate. Moreover, these time scales generically have the ordering: T r > T p > T m . 
In fact, a strong separation of time scales, T r 3> T p 3> T m is necessary to justify a 
Markovian stochastic model of the resolved variables. In the absence of such an extreme 
separation of time scales, the reduced dynamics is described by well-known generalized 
transport equations that are derived by the Mori-Zwanzig projection method [40]. This 
formal identity has recently been employed as the starting point for statistical closure for 
underresolved Hamiltonian dynamics [8, 9, 19]. This work follows a line of development 
in nonequilibrium statistical mechanics that have been pursued by various investigators 
in the past [2, 34, 36, 35, 39, 28]. While our approach is fundamentally different than 
these works, the generalized transport equations furnish a means of interpretation of our 
adjustable parameter e and an explanation of how e is related to the relevant times scales 
in the statistical closure. 

We focus on the near-equilibrium of our theory and of the projection operator iden- 
tity. To within the linear response approximation we are given an initial ensemble 
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Po = expfApA — 0(Ao) ] Peq ~ [1 + AgA]p eg , and we are interested in the evolution of 
the ensemble-averaged macrostate a ex (t) = (e ( *~* o) M | p ) ps ( (e ( *~* o) M) A* ) eq A . The 
following Mori-Zwanzig formula, which is an exact consequence of the Liouville equation 
up to the near-equilibrium approximations, states that 



da 



ex 



dt 



= JC- l a ex - f Z(t - t') C- l a ex {t') dt' , (38) 



where Z(6) = ([ e e ® L (QLA)](QLA*) ) eq . The central difficulty faced when implementing 
this formula in practice is to find tractable approximations to the exact memory kernel 
Z(0), which involves the complementary orthogonal propagator, e 0QL , in memory time 6. 

To relate this formula to our best-fit closure, let us attach a quasi-equilibrium density 
to the exact macrostate, a ex (t), at each instant of time. That is, we define the correspond- 
ing parameter vector X ex (t) = C _1 a ex (t) and density p ex (t) = p(X ex (t)). Of course, the 
goal of our closure scheme is to approximate the exact state X ex (t) by the estimated state 
X(t), which is computed without evaluation of the memory kernel Z{6) or equivalent quan- 
tities. For the purposes of interpreting e, we examine the entropy, s ex = <f)(X ex ) — X* ex a eX) 
of the quasi-equilibrium states attached to the exact states and calculate the associated 
entropy production. From (38) we find that 



ds e 



dt 



= f X ex (t)*Z(t-t')X ex (t')dt' (39) 



Correspondingly, the entropy production for our best-fit closure in the near-equilibrium 
regime is given by 

^ = e 2 X{t)*CG{t;e)DX{t), (40) 

where G is determined by the Riccati equation in (34). Roughly speaking, the free param- 
eter e should be adjusted so that the entropy production in (40) approximately matches 
that in (39). Since the memory kernel is not accessible in the best-fit reduced model, this 
adjustment is an empirical tuning [33]. 

To proceed further with the heuristic analysis, let us suppose that the resolved variable 
A is a scalar. Then, J = 0, and C, D > 0. Recalling the discussion in the preceding 



section, the plateau time is T p = ^D/C. We write the memory kernel in the form, 
Z(9) = D((6/T m ), for a dimensionless function ((u) with ((0) = 1 and lim^oo ((u) = 0; 
in other words, Z decays with a characteristic memory time scale T m . Then assuming 
that the relaxation time scale, T r , is large compared to T m , we have 



rlo n rt i — +' r T r°° 

X ex (t)a-^)X ex (t')dt' ^^X ex {tf C(u)du. 

Jto J-m J-„ JO 



dt Jto • . ,, 

On the other hand, in this scalar case the best-fit closure gives the approximation 

ds C € c / \ o 
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supposing that the T r is large compared to T p , and hence replacing G(t; e) by its asymp- 
totic value Goo = T p /Ce. These approximate expressions for the entropy production 
agree (up to some absolute constants) provided that e ~ T m /T p . Thus, we conclude that 
e effectively sets the memory time scale, which is not evaluated in the best-fit closure 
scheme, relative to the plateau time, which is known. Moreover, a similar analysis ap- 
plied to the relaxation equation for a in (34) shows that its decay rate is approximately 
e 2 G ryo D = e/T p . Consequently, we obtain the complementary result that e ~ T p /T r . The 
fact that T p /T r ~ T m /T p is already implied by (38) when there is a separation of time 
scales. 

In essence, the single adjustable parameter e represents the minimal amount of extra 
information about memory effects that must be included into the best-fit closure scheme. 
Since e enters into the lack-of-fit Lagrangian as a scale factor for the closure potential w 
that determines the irreversible terms in the closed reduced equations, it is manifest that 
e regulates the magnitude of dissipation in the reduced model. As the discussion in this 
section shows, the appropriate value of e is related to a common ratio of the characteristic 
time scales in the model reduction problem. But, from the point of view of implemented 
computations, these rough estimates do not suffice to determine e quantatively. Rather, e 
must be estimated from some observations or simulations that are not part of the best-fit 
closure itself. 

Our presentation throughout this paper has been restricted to a single adjustable 
parameter e, and the heuristic analysis given above applies only to a single scalar resolved 
variable A. In a companion paper [33], we implement the model reduction in this form 
and investigate quantitatively how well the statistical closure scheme approximates fully 
resolved statistical solutions. In such investigations, and in other potential applications, 
the single parameter theory can be expected to perform well when it is possible to identify 
a memory time and a plateau time, at least approximately, that pertain to all the resolved 
variables. In general, when multiple time scales operate, a better approximation might 
be obtained by a more intricate construction of the lack-of-fit functional. Specifically, we 
could replace the Lagrangian (14) by 

C(X,X) = i((P A i?) 2 |p(A)> + i((E A Q A i?) 2 |p(A)), 

in which E\ is a self-adjoint, positive operator on L 2 (r„,p(A), possibly depending on A, 
with operator norm, < 1- Throughout the present paper E\ = el. By designing 

other operators E\ the resulting closure potential w might more faithfully represent the 
influence of the unresolved fluctuations on the resolved variables. The best-fit strategy 
would carry over to this generalization, but its practical implementation would require 
multiple tuned parameters. 

8 Discussion and conclusions 

The methodology presented in this paper offers a new approach to the general problem of 
constructing a reduced statistical-dynamical description of a complex Hamiltonian system. 
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Most approaches to problems of this kind in nonequilibrium statistical mechanics interpose 
a stochastic model between the given microscopic dynamics and the derived macroscopic 
dynamics, while our approach deduces an irreversible macrodynamics for the selected set 
of resolved variables directly from the deterministic and reversible microdynamics. Our 
method is based on an optimization principle for parameterized paths of trial probability 
densities on phase space: paths that achieve the least residual with respect to the Liouville 
equation determine the evolution of the estimated macrostates. This approach provides an 
information-theoretic meaning to the statistical closure, since the best-fit paths minimize a 
cost functional that measures the rate of information loss incurred by the model reduction. 
Moreover, it leads to a closure theory having a form analogous to optimal filtering theory. 
That is, the best-fit closure blends unresolved microdynamics into the reduced equations 
for the resolved macrostate via a matrix of transport coefficients in much the same way 
that the Kalman-Bucy filter assimilates continuous measurements into a stochastic state 
equation via a gain matrix. By formulating the prediction of underresolved dynamics as 
optimal estimation, the best-fit approach furnishes a new perspective on statistical closure 
of complex, chaotic, or turbulent dynamics. 

The value function for the defining optimization principle is a key ingredient in the 
best-fit closure theory. It has the units of entropy production, it represents the optimal 
cost of reduction with respect to the selected resolved variables, and it solves the Hamilton- 
Jacobi equation associated with the cost function. The best-fit macrostate corresponds to 
the evolving minimizer of the value function, while the uncertainty of the best-fit estimate 
is measured by the Hessian matrix of the value function at the minimizer. We may say that 
our Hamilton- Jacobi equation plays a role analogous to that of the Fokker-Planck equation 
for a stochastic model whose resolved variables are described by a Langevin dynamics. One 
conclusion of our work is that irreversible behavior can be derived from an optimization 
principle and its related Hamilton- Jacobi theory, without explicitly introducing diffusive 
processes. 

Criteria for the choice of the resolved variables are not considered in this paper, but 
this is an important aspect of any model reduction strategy. In some physical problems 
it may be evident that certain variables offer a thermodynamic, or kinetic, description, 
and thus are natural resolved variables. The separation of time scales between resolved 
and unresolved motions often informs this choice. More generally, though, there may be 
latitude in the selection of the resolved variables, and the design of a good model reduction 
may require examining a variety of sets of resolved variables. In the best-fit closure, a 
preferred choice would be one for which the uncertainty of the estimated macrostate, 
as measured by the second-order behavior of the value function, is relatively small. In 
principle, therefore, the theory itself offers a method of distinguishing the predictive power 
of different sets of resolved variables. We leave the issue of designing reduced descriptions, 
however, to future investigations of particular problems with the best-fit methodology. 

The framework for statistical model reduction that we propose in this work is in- 
tended to be implemented on complex, multi-scale dynamical problems. To this end, the 
local quadratic approximation of the value function in the far-from-equilibrium regime 
and the related simplification in the near-to-equilibrium regime are developed so that the 
numerical solution of the Hamilton- Jacobi equation itself is avoided. Under these approxi- 
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mations the best-fit closure theory admits a computational effective implementation, since 
sophisticated and powerful numerical methods exist for the integration of Riccati matrix 
differential equations. The best-fit approach does not require the computation of mem- 
ory kernels, which arise in approaches through Mori-Zwanzig projection methods and 
necessitate either simulations of cumbersome propagators or analytical approximations 
available only in limiting situations. Instead, the tunable parameter e introduces a single 
time-scale separation into the best-fit reduction strategy. We recognize, though, that ex- 
tensions of our basic methodology would likely involve more elaborate trial densities and 
more detailed lack-of-fit norms, which would include multiple time-scale parameters. 
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